Combining Rules and CRF Learning for Opinion Source Identification in Spanish Texts

Authors

  • Aiala Rosá
  • Dina Wonsever
  • Jean-Luc Minel
Abstract

In this work we present a system for the automatic annotation of opinions in Spanish texts. We focus mainly on the definition of a TFS-style model for opinion predicates and their arguments, on the creation of a lexicon of opinion predicates, and on two additional variants for identifying the source of opinions. The original system extracts opinions and all their elements (predicate, source, topic and message) using hand-coded rules. The first variant uses a CRF model to learn the source, assuming that the predicate is already tagged. The second variant is a combined version in which the source recognized by the rule-based system is added as an additional attribute for training the CRF model. We found that this hybrid system performs better than either system evaluated separately. This work involved the construction of several resources for Spanish: a lexicon of opinion predicates, a 13,000-word corpus with full opinion annotations, and a 40,000-word corpus with annotations of opinion predicates and sources.
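The combined variant described in the abstract can be illustrated with a minimal sketch of the feature-construction step only. Everything named here is an assumption for illustration: the `rule_based_source` function is a toy stand-in (it simply marks the tokens before the predicate), not the authors' hand-coded rules, and the feature names and the example sentence are hypothetical. The output is a list of per-token dictionaries, an input shape commonly accepted by CRF toolkits.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, int]


def rule_based_source(tokens: List[str], predicate_idx: int) -> List[bool]:
    """Toy stand-in for a rule-based source tagger (NOT the paper's rules).

    Assumption: every token before the opinion predicate is flagged as source.
    """
    return [i < predicate_idx for i in range(len(tokens))]


def token_features(
    tokens: List[str], predicate_idx: int, use_rule_output: bool
) -> List[Dict[str, Feature]]:
    """Build one feature dict per token; the predicate position is given,
    mirroring the assumption that the predicate is already tagged."""
    rule_flags = rule_based_source(tokens, predicate_idx)
    features = []
    for i, tok in enumerate(tokens):
        feats: Dict[str, Feature] = {
            "word.lower": tok.lower(),
            "is_title": tok.istitle(),
            "is_predicate": i == predicate_idx,
            "dist_to_predicate": i - predicate_idx,
        }
        if use_rule_output:
            # Hybrid variant: the rule system's decision becomes an extra attribute.
            feats["rule_source"] = rule_flags[i]
        features.append(feats)
    return features


# Hypothetical example sentence: "El ministro dijo que la economía crece"
tokens = ["El", "ministro", "dijo", "que", "la", "economía", "crece"]
predicate_idx = 2  # position of the opinion predicate "dijo"

crf_only = token_features(tokens, predicate_idx, use_rule_output=False)
hybrid = token_features(tokens, predicate_idx, use_rule_output=True)
```

The only difference between the two feature sets is the `rule_source` attribute, so a sequence model trained on `hybrid` can learn when to trust the rule output and when to override it.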

Similar resources

Opinion Identification in Spanish Texts

We present our work on the identification of opinions and their components: the source, the topic and the message. We describe a rule-based system that achieved a recall of 74% and a precision of 94%. Experimentation with machine-learning techniques for the same task is currently underway.

Factuality Annotation and Learning in Spanish Texts

We present a proposal for the annotation of the factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. In addition, some learning experiments (SVM and CRF) have been carried out, showing encouraging results.

Automated Rule Selection for Aspect Extraction in Opinion Mining

Aspect extraction aims to extract fine-grained opinion targets from opinion texts. Recent work has shown that the syntactical approach, which employs rules about grammar dependency relations between opinion words and aspects, performs quite well. This approach is highly desirable in practice because it is unsupervised and domain independent. However, the rules need to be carefully selected and ...

Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods

Identification of prepositional phrases (PP) has been an issue in the field of Natural Language Processing (NLP). In this paper, focusing on Chinese patent texts, we present a rule-based method and a CRF-based method to identify PPs. In the rule-based method, according to the special features and expressions of PPs, we manually write targeted formal identification rules; in the CRF approach, af...

Identification of Opinion Holders

Opinion holder identification aims to extract entities that express opinions in sentences. In this paper, opinion holder identification is divided into two subtasks: author’s opinion recognition and opinion holder labeling. Support vector machine (SVM) is adopted to recognize author’s opinions, and conditional random field algorithm (CRF) is utilized to label opinion holders. New features are p...

Publication date: 2012